21 research outputs found

SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores

Author: Agullo E.
Barnes J.
Gonnet P.
Reinders J.
Theuns T.
YarKhan A.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/06/2016

We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smooth Particle Hydrodynamics) on hybrid shared / distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100-supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches:

• Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores.
• Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, as opposed to just the data, as is the case with most partitioning schemes, is equally distributed across all nodes.
• Fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring on tasks that rely on data from other nodes until it arrives.

In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures.
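As a reading aid for the efficiency figure quoted in this abstract, the following is a minimal sketch of how strong-scaling parallel efficiency is conventionally computed. The function name and the sample timings are hypothetical illustrations, not measurements or code from the paper.

```python
def strong_scaling_efficiency(t_base, cores_base, t_scaled, cores_scaled):
    """Parallel efficiency for a fixed problem size (strong scaling).

    Efficiency is the measured speedup divided by the ideal speedup,
    where the ideal speedup equals the ratio of core counts.
    """
    speedup = t_base / t_scaled
    ideal_speedup = cores_scaled / cores_base
    return speedup / ideal_speedup


# Hypothetical timings, chosen only to illustrate the arithmetic:
# 512 s on 32 cores versus 1.6 s on 32 * 512 = 16384 cores.
efficiency = strong_scaling_efficiency(512.0, 32, 1.6, 32 * 512)
print(f"parallel efficiency: {efficiency:.3f}")
```

Under this definition, 60% efficiency across a 512-fold increase in cores corresponds to a speedup of about 0.6 × 512 ≈ 307 over the baseline run.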

arXiv.org e-Print Archive

Durham Research Online

Crossref

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems

Author: Abalenkovs Maksims
Abdelfattah Ahmad
Dongarra Jack
Gates M.
Haidar A
Kurzak Jakub
Luszczek Piotr
Tomov Stanimire
Yamazaki I.
YarKhan A.
Publication venue: 'FSAEIHE South Ural State University (National Research University)'
Publication date: 01/01/2015

The University of Manchester - Institutional Repository

The cooperative parallel: A discussion about run-time schedulers for nested parallelism

Author: A YarKhan
D Caballero
E Ayguadé
J Cajas
J Jeffers
J Kurzak
James P. Briggs
L Meadows
MA Serrano
ME Russinovich
R Blikberg
R Nanjegowda
RD Blumofe
VV Dimakopoulos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Nested parallelism is a well-known parallelization strategy to exploit irregular parallelism in HPC applications. This strategy also fits in critical real-time embedded systems, composed of a set of concurrent functionalities. In this case, nested parallelism can be used to further exploit the parallelism of each functionality. However, current run-time implementations of nested parallelism can produce inefficiencies and load imbalance. Moreover, in critical real-time embedded systems, it may lead to incorrect executions due to, for instance, a work non-conserving scheduler. In both cases, the reason is that the teams of OpenMP threads are a black-box for the scheduler, i.e., the scheduler that assigns OpenMP threads and tasks to the set of available computing resources is agnostic to the internal execution of each team. This paper proposes a new run-time scheduler that considers dynamic information of the OpenMP threads and tasks running within several concurrent teams, i.e., concurrent parallel regions. This information may include the existence of OpenMP threads waiting in a barrier and the priority of tasks ready to execute. By making the concurrent parallel regions cooperate, the shared computing resources can be better controlled and a work-conserving and priority-driven scheduler can be guaranteed.

Crossref

UPCommons. Portal del coneixement obert de la UPC

Enabling Workflows in GridSolve: Request Sequencing and Service Trading

Author: A Hurault
Asim YarKhan
Aurèlie Hurault
DC Arnold
F Song
GB Berriman
J Yu
Jack Dongarra
K Seymour
Keith Seymour
M Beck
M Cosnard
P Couvares
T Brady
T Oinn
Y Tanimura
Yinan Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

GridSolve employs an RPC-based client-agent-server model for solving computational problems. There are two deficiencies associated with GridSolve when a computational problem essentially forms a workflow consisting of a sequence of tasks with data dependencies between them. First, intermediate results are always passed through the client, resulting in unnecessary data transport. Second, since the execution of each individual task is a separate RPC session, it is difficult to enable any potential parallelism among tasks. This paper presents a request sequencing technique that addresses these deficiencies and enables workflow executions. Building on the request sequencing work, one way to generate workflows is by taking higher level service requests and decomposing them into a sequence of simpler service requests using a technique called service trading. A service trading component is added to GridSolve to take advantage of the new dynamic request sequencing. The features described here include automatic DAG construction and data dependency analysis, direct interserver data transfer, parallel task execution capabilities, and a service trading component.
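The DAG construction and data dependency analysis mentioned in this abstract can be illustrated with a small sketch. It is not GridSolve's API or the authors' algorithm: the names `build_dag` and `parallel_levels`, the `(name, inputs, outputs)` request format, and the sample requests are all assumptions made for illustration. Only read-after-write dependencies are tracked here.

```python
def build_dag(requests):
    """Derive task dependencies from a sequence of requests.

    `requests` is a list of (name, inputs, outputs) tuples in submission
    order. A task depends on the most recent earlier task that wrote
    one of the data items it reads (read-after-write only).
    """
    last_writer = {}  # data item -> name of the task that last wrote it
    deps = {}         # task name -> set of task names it must wait for
    for name, inputs, outputs in requests:
        deps.setdefault(name, set())
        for item in inputs:
            if item in last_writer:
                deps[name].add(last_writer[item])
        for item in outputs:
            last_writer[item] = name
    return deps


def parallel_levels(deps):
    """Group tasks into levels; tasks within a level have no mutual dependencies."""
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependency")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Hypothetical request sequence: C needs the results of A and B, D needs A.
requests = [
    ("A", [], ["x"]),
    ("B", [], ["y"]),
    ("C", ["x", "y"], ["z"]),
    ("D", ["x"], ["w"]),
]
deps = build_dag(requests)
levels = parallel_levels(deps)
print(levels)
```

In this sketch, tasks in the same level could be dispatched concurrently, and an edge from A to C marks a result that could in principle move directly between servers rather than through the client.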

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Hal-Diderot

GrADSolve – RPC for High Performance Computing on the Grid

Author: A. YarKhan
A.D. Birrell
Alexandre Denis
C. René
F. Berman
H. Nakada
I. Foster
J. Maassen
K. Seymour
M. Sato
R. Wolski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

Crossref

The University of Manchester - Institutional Repository

Scalable, low complexity, and fast greedy scheduling heuristics for highly heterogeneous distributed computing systems

Author: A Al-Qawasmeh
A Ghafoor
A YarKhan
AS Tanenbaum
BC Neuman
Cesar O. Diaz
CO Diaz
EU Munir
F Pinel
F Xhafa
J Kolodziej
Johnatan E. Pecero
L Wang
M Maheswaran
OH Ibarra
P Lindberg
P Luo
Pascal Bouvry
S Ali
S Nesmachnow
TD Braun
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Software

Author: A. Chien
A. YarKhan
F. Berman
H. Casanova
H. Xia
K. Cooper

Project is to provide programming tools and an execution environment to ease program development for the Grid. This paper presents recent extensions to the GrADS software framework: a new approach to scheduling workflow computations, applied to a 3-D image reconstruction application; a simple stop/migrate/restart approach to rescheduling Grid applications, applied to a QR factorization benchmark; and a process-swapping approach to rescheduling, applied to an N-body simulation. Experiments validating these methods were carried out on both the GrADS MacroGrid (a small but functional Grid) and the MicroGrid (a controlled emulation of the Grid)

CiteSeerX

A Novel Particle Swarm Optimization Approach for Grid Job Scheduling

Author: A. Abraham
A. Salman
A. YarKhan
D. Liu
I. Foster
V. Di Martino
W. Pang
Y. Gao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref